Overview

Dataset statistics

Number of variables: 9
Number of observations: 1030
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 25
Duplicate rows (%): 2.4%
Total size in memory: 72.5 KiB
Average record size in memory: 72.1 B
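As a rough arithmetic check of the memory figures, suppose every column is stored as an 8-byte numeric type (float64 or int64). This is an assumption, because the report does not list dtypes. Under it, the column buffers alone come to just under the reported total, and the small remainder is presumably the index and other overhead:

```python
# Assumption: all 9 columns use 8-byte numeric dtypes (float64/int64).
N_ROWS = 1030
N_COLS = 9
BYTES_PER_VALUE = 8

column_bytes = N_ROWS * BYTES_PER_VALUE   # data buffer of one column
data_bytes = N_COLS * column_bytes        # data buffers of all columns

print(f"one column:  {column_bytes} B = {column_bytes / 1024:.1f} KiB")
print(f"all columns: {data_bytes} B = {data_bytes / 1024:.1f} KiB")

# Average record size = total size in memory / number of observations.
reported_total_bytes = 72.5 * 1024
avg_record = reported_total_bytes / N_ROWS
print(f"average record size: {avg_record:.1f} B")
```

The one-column figure is consistent with the 8.0 KiB memory size reported for each variable.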

Variable types

NUM: 9

Reproduction

Analysis started: 2020-10-30 23:41:18.432079
Analysis finished: 2020-10-30 23:41:31.388782
Duration: 12.96 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 25 (2.4%) duplicate rows [Duplicates]
slag has 471 (45.7%) zeros [Zeros]
ash has 566 (55.0%) zeros [Zeros]
superplastic has 379 (36.8%) zeros [Zeros]

Variables

cement
Real number (ℝ≥0)

Distinct count: 278
Unique (%): 27.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 281.16786407766995
Minimum: 102.0
Maximum: 540.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 102
5th percentile: 143.745
Q1: 192.375
median: 272.9
Q3: 350
95th percentile: 480
Maximum: 540
Range: 438
Interquartile range (IQR): 157.625
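Fractional quartiles such as Q1 = 192.375 suggest that quantiles are interpolated between neighbouring sorted values rather than picked directly from the data. The following is a minimal sketch of linear interpolation. It assumes the report follows the pandas/NumPy default, which places quantile p at 0-based position p * (n - 1) of the sorted data. `linear_quantile` is a hypothetical helper, not part of pandas-profiling:

```python
def linear_quantile(values, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    data = sorted(values)
    pos = p * (len(data) - 1)        # fractional 0-based position
    lo = int(pos)                    # index of the lower neighbour
    hi = min(lo + 1, len(data) - 1)  # index of the upper neighbour
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


sample = [1, 2, 3, 4]
q1 = linear_quantile(sample, 0.25)
q3 = linear_quantile(sample, 0.75)
print(q1, q3, q3 - q1)  # the last value is the IQR
```

With this convention the interquartile range is simply Q3 - Q1. For cement, that gives 350 - 192.375 = 157.625.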

Descriptive statistics

Standard deviation: 104.5063645
Coefficient of variation (CV): 0.3716867318
Kurtosis: -0.5206522845
Mean: 281.1678641
Median Absolute Deviation (MAD): 79.4
Skewness: 0.5094811789
Sum: 289602.9
Variance: 10921.58022
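Several of these statistics depend on one another:

* The coefficient of variation is the standard deviation divided by the mean.
* The variance is the square of the standard deviation.
* The mean is the sum divided by the number of observations.

Here is a short check against the cement figures above, with values copied from this report:

```python
# Values copied from the cement section of this report.
n = 1030
total = 289602.9
mean = 281.1678641
std = 104.5063645
reported_cv = 0.3716867318
reported_var = 10921.58022

cv = std / mean            # coefficient of variation
var = std ** 2             # variance = standard deviation squared
mean_from_sum = total / n  # mean recovered from the sum

print(f"CV = {cv:.10f}, variance = {var:.5f}, mean = {mean_from_sum:.7f}")
```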
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
425                    20  1.9%
362.6                  20  1.9%
251.4                  15  1.5%
446                    14  1.4%
310                    14  1.4%
475                    13  1.3%
331                    13  1.3%
250                    13  1.3%
387                    12  1.2%
349                    12  1.2%
Other values (268)    884  85.8%

Minimum 5 values
Value               Count  Frequency (%)
102                     4  0.4%
108.3                   4  0.4%
116                     4  0.4%
122.6                   4  0.4%
132                     2  0.2%

Maximum 5 values
Value               Count  Frequency (%)
540                     9  0.9%
531.3                   5  0.5%
528                     1  0.1%
525                     7  0.7%
522                     2  0.2%
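A fixed-size-bin histogram splits the value range into equal-width intervals and counts the values in each one. The sketch below uses 10 bins. It assumes the bins span exactly [minimum, maximum] and follow NumPy's convention: every bin is half-open except the last, which also includes the maximum. `fixed_width_histogram` is a hypothetical helper name:

```python
def fixed_width_histogram(values, bins=10):
    """Return (edges, counts) for `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate range: a single bin holds everything
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        # Bin index from the offset; clamp so x == max lands in the last (closed) bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = fixed_width_histogram(list(range(11)), bins=10)
print(counts)
```

For cement (minimum 102, maximum 540), each of the 10 bins would therefore be (540 - 102) / 10 = 43.8 units wide.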

slag
Real number (ℝ≥0)

ZEROS

Distinct count: 185
Unique (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.89582524271844
Minimum: 0.0
Maximum: 359.4
Zeros: 471
Zeros (%): 45.7%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 22
Q3: 142.95
95th percentile: 236
Maximum: 359.4
Range: 359.4
Interquartile range (IQR): 142.95

Descriptive statistics

Standard deviation: 86.27934175
Coefficient of variation (CV): 1.167580732
Kurtosis: -0.5081754789
Mean: 73.89582524
Median Absolute Deviation (MAD): 22
Skewness: 0.8007168956
Sum: 76112.7
Variance: 7444.124812
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
0                     471  45.7%
189                    30  2.9%
106.3                  20  1.9%
24                     14  1.4%
20                     12  1.2%
145                    11  1.1%
19                     10  1.0%
98.1                   10  1.0%
22                      8  0.8%
26                      8  0.8%
Other values (175)    436  42.3%

Minimum 5 values
Value               Count  Frequency (%)
0                     471  45.7%
11                      4  0.4%
13.6                    5  0.5%
15                      5  0.5%
17.2                    1  0.1%

Maximum 5 values
Value               Count  Frequency (%)
359.4                   2  0.2%
342.1                   2  0.2%
316.1                   2  0.2%
305.3                   4  0.4%
290.2                   2  0.2%

ash
Real number (ℝ≥0)

ZEROS

Distinct count: 156
Unique (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.18834951456311
Minimum: 0.0
Maximum: 200.1
Zeros: 566
Zeros (%): 55.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 118.3
95th percentile: 167
Maximum: 200.1
Range: 200.1
Interquartile range (IQR): 118.3

Descriptive statistics

Standard deviation: 63.99700415
Coefficient of variation (CV): 1.181010397
Kurtosis: -1.328746435
Mean: 54.18834951
Median Absolute Deviation (MAD): 0
Skewness: 0.5373539058
Sum: 55814
Variance: 4095.616541
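The zero MAD for ash follows from its zero inflation. Because 55.0% of the values equal 0, the median is 0. The absolute deviations from the median are therefore the values themselves, and more than half of them are again 0. The sketch below assumes MAD means the unscaled median of absolute deviations from the median. `median_abs_deviation` is a hypothetical helper, and the sample lists are toy data:

```python
from statistics import median


def median_abs_deviation(values):
    """Median of |x - median(values)|, with no normal-consistency scaling factor."""
    center = median(values)
    return median([abs(x - center) for x in values])


# Toy zero-inflated sample: 6 of 10 values are 0, so both medians are 0.
zero_inflated = [0, 0, 0, 0, 0, 0, 24.5, 79, 94, 118.3]
# Toy sample without zero inflation, for contrast.
ordinary = [1, 2, 3, 4, 100]

print(median_abs_deviation(zero_inflated), median_abs_deviation(ordinary))
```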
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
0                     566  55.0%
118.3                  20  1.9%
141                    16  1.6%
24.5                   15  1.5%
79                     14  1.4%
94                     13  1.3%
100.4                  11  1.1%
100.5                  10  1.0%
98.8                   10  1.0%
174.2                  10  1.0%
Other values (146)    345  33.5%

Minimum 5 values
Value               Count  Frequency (%)
0                     566  55.0%
24.5                   15  1.5%
59                      1  0.1%
60                      1  0.1%
71                      1  0.1%

Maximum 5 values
Value               Count  Frequency (%)
200.1                   1  0.1%
200                     1  0.1%
195                     3  0.3%
194.9                   1  0.1%
194                     1  0.1%

water
Real number (ℝ≥0)

Distinct count: 195
Unique (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.56728155339806
Minimum: 121.8
Maximum: 247.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 121.8
5th percentile: 146.1
Q1: 164.9
median: 185
Q3: 192
95th percentile: 228
Maximum: 247
Range: 125.2
Interquartile range (IQR): 27.1

Descriptive statistics

Standard deviation: 21.35421857
Coefficient of variation (CV): 0.1176104989
Kurtosis: 0.1220816744
Mean: 181.5672816
Median Absolute Deviation (MAD): 13
Skewness: 0.07462838429
Sum: 187014.3
Variance: 456.0026505
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
192                   118  11.5%
228                    54  5.2%
185.7                  46  4.5%
203.5                  36  3.5%
186                    28  2.7%
162                    20  1.9%
164.9                  20  1.9%
153.5                  15  1.5%
185                    15  1.5%
178                    14  1.4%
Other values (185)    664  64.5%

Minimum 5 values
Value               Count  Frequency (%)
121.8                   5  0.5%
126.6                   5  0.5%
127                     1  0.1%
127.3                   1  0.1%
137.8                   5  0.5%

Maximum 5 values
Value               Count  Frequency (%)
247                     1  0.1%
246.9                   1  0.1%
237                     1  0.1%
236.7                   1  0.1%
228                    54  5.2%
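For each variable, the first value-count table lists the ten most frequent values. The remainder is folded into an "Other values (k)" row, where k is the number of remaining distinct values and the count is their combined frequency. The next two tables list the five smallest and five largest distinct values. The sketch below uses `collections.Counter`. `value_count_tables` is a hypothetical helper. Ordering equally frequent values by first occurrence is an assumption and may not match the report:

```python
from collections import Counter


def value_count_tables(values, top=10, extremes=5):
    """Sketch of the per-variable tables: top values (+ 'Other values'), minima, maxima."""
    n = len(values)
    counts = Counter(values)
    ranked = counts.most_common()  # by count, ties in first-seen order

    common = [(v, c, 100 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other_count = sum(c for _, c in rest)
        common.append((f"Other values ({len(rest)})", other_count, 100 * other_count / n))

    distinct = sorted(counts)
    minimum = [(v, counts[v], 100 * counts[v] / n) for v in distinct[:extremes]]
    maximum = [(v, counts[v], 100 * counts[v] / n) for v in reversed(distinct[-extremes:])]
    return common, minimum, maximum


# Toy data: 12 observations, 6 distinct values.
values = [3, 3, 3, 3, 7, 7, 7, 28, 28, 1, 90, 365]
common, minimum, maximum = value_count_tables(values, top=2, extremes=2)
for row in common:
    print(row)
```

As a consistency check on the water tables, the ten listed counts sum to 366 and the Other values row adds 664. Together they account for all 1030 observations.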

superplastic
Real number (ℝ≥0)

ZEROS

Distinct count: 111
Unique (%): 10.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.204660194174758
Minimum: 0.0
Maximum: 32.2
Zeros: 379
Zeros (%): 36.8%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 6.4
Q3: 10.2
95th percentile: 16.055
Maximum: 32.2
Range: 32.2
Interquartile range (IQR): 10.2

Descriptive statistics

Standard deviation: 5.973841392
Coefficient of variation (CV): 0.9627991228
Kurtosis: 1.411268965
Mean: 6.204660194
Median Absolute Deviation (MAD): 5.3
Skewness: 0.9072025749
Sum: 6390.8
Variance: 35.68678098
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
0                     379  36.8%
11.6                   37  3.6%
8                      27  2.6%
7                      19  1.8%
6                      17  1.7%
9                      16  1.6%
8.9                    16  1.6%
7.8                    16  1.6%
9.9                    16  1.6%
10                     15  1.5%
Other values (101)    472  45.8%

Minimum 5 values
Value               Count  Frequency (%)
0                     379  36.8%
1.7                     4  0.4%
1.9                     1  0.1%
2                       1  0.1%
2.2                     1  0.1%

Maximum 5 values
Value               Count  Frequency (%)
32.2                    5  0.5%
28.2                    5  0.5%
23.4                    5  0.5%
22.1                    1  0.1%
22                      6  0.6%

coarseagg
Real number (ℝ≥0)

Distinct count: 284
Unique (%): 27.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 972.9189320388349
Minimum: 801.0
Maximum: 1145.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 801
5th percentile: 842
Q1: 932
median: 968
Q3: 1029.4
95th percentile: 1104
Maximum: 1145
Range: 344
Interquartile range (IQR): 97.4

Descriptive statistics

Standard deviation: 77.75395397
Coefficient of variation (CV): 0.07991822485
Kurtosis: -0.5990161032
Mean: 972.918932
Median Absolute Deviation (MAD): 46.3
Skewness: -0.04021974481
Sum: 1002106.5
Variance: 6045.677357
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
932                    57  5.5%
852.1                  45  4.4%
944.7                  30  2.9%
968                    29  2.8%
1125                   24  2.3%
967                    19  1.8%
1047                   19  1.8%
974                    12  1.2%
942                    12  1.2%
822                    12  1.2%
Other values (274)    771  74.9%

Minimum 5 values
Value               Count  Frequency (%)
801                     4  0.4%
801.1                   1  0.1%
801.4                   1  0.1%
811                     2  0.2%
814                     1  0.1%

Maximum 5 values
Value               Count  Frequency (%)
1145                    1  0.1%
1134.3                  5  0.5%
1130                    1  0.1%
1125                   24  2.3%
1124.4                  2  0.2%

fineagg
Real number (ℝ≥0)

Distinct count: 302
Unique (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 773.5804854368931
Minimum: 594.0
Maximum: 992.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 594
5th percentile: 613
Q1: 730.95
median: 779.5
Q3: 824
95th percentile: 898.09
Maximum: 992.6
Range: 398.6
Interquartile range (IQR): 93.05

Descriptive statistics

Standard deviation: 80.17598014
Coefficient of variation (CV): 0.1036427129
Kurtosis: -0.1021769893
Mean: 773.5804854
Median Absolute Deviation (MAD): 45.5
Skewness: -0.2530095977
Sum: 796787.9
Variance: 6428.187792
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
594                    30  2.9%
755.8                  30  2.9%
670                    23  2.2%
613                    22  2.1%
801                    16  1.6%
887.1                  15  1.5%
746.6                  15  1.5%
845                    14  1.4%
712                    14  1.4%
750                    12  1.2%
Other values (292)    839  81.5%

Minimum 5 values
Value               Count  Frequency (%)
594                    30  2.9%
605                     5  0.5%
611.8                   5  0.5%
612                     1  0.1%
613                    22  2.1%

Maximum 5 values
Value               Count  Frequency (%)
992.6                   5  0.5%
945                     4  0.4%
943.1                   4  0.4%
942                     4  0.4%
925.7                   5  0.5%

age
Real number (ℝ≥0)

Distinct count: 14
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.662135922330094
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 7
median: 28
Q3: 56
95th percentile: 180
Maximum: 365
Range: 364
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 63.16991158
Coefficient of variation (CV): 1.383419989
Kurtosis: 12.16898898
Mean: 45.66213592
Median Absolute Deviation (MAD): 21
Skewness: 3.269177401
Sum: 47032
Variance: 3990.437729
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
28                    425  41.3%
3                     134  13.0%
7                     126  12.2%
56                     91  8.8%
14                     62  6.0%
90                     54  5.2%
100                    52  5.0%
180                    26  2.5%
91                     22  2.1%
365                    14  1.4%
Other values (4)       24  2.3%

Minimum 5 values
Value               Count  Frequency (%)
1                       2  0.2%
3                     134  13.0%
7                     126  12.2%
14                     62  6.0%
28                    425  41.3%

Maximum 5 values
Value               Count  Frequency (%)
365                    14  1.4%
360                     6  0.6%
270                    13  1.3%
180                    26  2.5%
120                     3  0.3%
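Age has only 14 distinct values, and all of them appear in the value-count tables above. The whole column can therefore be rebuilt from those counts and its summary statistics recomputed. The sketch below uses only the standard library and rests on these assumptions:

* Variance is the sample variance, with an n - 1 denominator.
* Quantiles are linearly interpolated.
* Skewness and kurtosis are the bias-corrected estimators used by pandas' `Series.skew()` and `Series.kurt()`.
* MAD is the unscaled median absolute deviation.

The printed values can be compared with the age statistics reported above:

```python
import math
from statistics import median

# (value, count) pairs for age, read from the value-count tables in this report.
AGE_COUNTS = {
    1: 2, 3: 134, 7: 126, 14: 62, 28: 425, 56: 91, 90: 54,
    91: 22, 100: 52, 120: 3, 180: 26, 270: 13, 360: 6, 365: 14,
}
ages = sorted(v for v, c in AGE_COUNTS.items() for _ in range(c))
n = len(ages)


def quantile(data, p):
    """Linear interpolation at 0-based position p * (n - 1) of sorted `data`."""
    pos = p * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


mu = sum(ages) / n
dev = [x - mu for x in ages]
m2 = sum(d ** 2 for d in dev) / n  # biased central moments
m3 = sum(d ** 3 for d in dev) / n
m4 = sum(d ** 4 for d in dev) / n

variance = m2 * n / (n - 1)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mu
# Bias-corrected skewness (G1) and excess kurtosis (G2).
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1))
center = median(ages)
mad = median([abs(x - center) for x in ages])  # unscaled median absolute deviation

print(f"n={n}  sum={sum(ages)}  mean={mu:.8f}")
print(f"5%={quantile(ages, 0.05)}  Q1={quantile(ages, 0.25)}  median={center}  "
      f"Q3={quantile(ages, 0.75)}  95%={quantile(ages, 0.95)}")
print(f"std={std:.8f}  variance={variance:.6f}  CV={cv:.9f}  MAD={mad}")
print(f"skewness={skewness:.9f}  kurtosis={kurtosis:.8f}")
```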

strength
Real number (ℝ≥0)

Distinct count: 845
Unique (%): 82.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.817961165048544
Minimum: 2.33
Maximum: 82.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 2.33
5th percentile: 10.961
Q1: 23.71
median: 34.445
Q3: 46.135
95th percentile: 66.802
Maximum: 82.6
Range: 80.27
Interquartile range (IQR): 22.425

Descriptive statistics

Standard deviation: 16.70574196
Coefficient of variation (CV): 0.4664068366
Kurtosis: -0.3137248604
Mean: 35.81796117
Median Absolute Deviation (MAD): 10.93
Skewness: 0.4169772884
Sum: 36892.5
Variance: 279.0818145
[Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
33.4                    6  0.6%
79.3                    4  0.4%
41.05                   4  0.4%
71.3                    4  0.4%
35.3                    4  0.4%
23.52                   4  0.4%
31.35                   4  0.4%
77.3                    4  0.4%
37.27                   3  0.3%
55.9                    3  0.3%
Other values (835)    990  96.1%

Minimum 5 values
Value               Count  Frequency (%)
2.33                    1  0.1%
3.32                    1  0.1%
4.57                    1  0.1%
4.78                    1  0.1%
4.83                    1  0.1%

Maximum 5 values
Value               Count  Frequency (%)
82.6                    1  0.1%
81.75                   1  0.1%
80.2                    1  0.1%
79.99                   1  0.1%
79.4                    1  0.1%

Interactions

[Interaction plots for each pair of the 9 variables (81 figures); images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scaling of the two variables. Consequently, for an exact linear relationship the steepness of the line does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
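The definition translates directly into code. Below is a minimal pure-Python sketch, with `pearson_r` as a hypothetical helper name. For illustration it is applied to the cement and strength values of the first ten sample rows in this report. Rescaling and shifting both variables leaves r unchanged, which demonstrates the invariance described above:

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (sx * sy)  # the 1/(n - 1) factors of covariance and std cancel


# cement and strength from sample rows 0-9 of this report (illustration only).
cement = [141.3, 168.9, 250.0, 266.0, 154.8, 255.0, 166.8, 251.4, 296.0, 155.0]
strength = [29.89, 23.51, 29.22, 45.85, 18.29, 21.86, 15.75, 36.64, 21.65, 28.99]

r = pearson_r(cement, strength)
r_rescaled = pearson_r([10 * c + 3 for c in cement], [s / 2 - 1 for s in strength])
print(round(r, 4), round(r_rescaled, 4))
```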

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations. In other words, ρ is Pearson's r computed on the ranks.
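In code, Spearman's ρ is Pearson's r applied to ranks. The pure-Python sketch below gives tied values the average of their ranks. That convention is an assumption here, though it is the usual one. It matters for columns such as slag, ash and superplastic, where many values are tied at 0. `average_ranks` and `spearman_rho` are hypothetical helper names:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values all get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson_r(x, y), 4), round(spearman_rho(x, y), 4))
```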

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n - 1)/2 for n observations.
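Below is a direct O(n²) sketch of this pair-counting definition, often called τ_a. Pairs tied in either variable count as neither concordant nor discordant. This dataset has ties, as in its zero-inflated columns. In that case pandas' `DataFrame.corr(method="kendall")` delegates to SciPy's `kendalltau`, whose default is the tie-adjusted τ_b, and pandas-profiling presumably relies on that. So this sketch would not reproduce the report's matrix exactly. `kendall_tau_a` is a hypothetical helper name:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, n(n - 1)/2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0: tied in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant pairs, 1 discordant
```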

Phik (φk)

Phik (φk) is a newer correlation coefficient with the following properties:

* It works consistently between categorical, ordinal and interval variables.
* It captures non-linear dependency.
* It reverts to the Pearson correlation coefficient for a bivariate normal input distribution.

Extensive documentation is available from the phik project.

Missing values

[Missing-value plots (2 figures); images not preserved]

Sample

First rows

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength
   0   141.3  212.0    0.0  203.5           0.0      971.8    748.5   28     29.89
   1   168.9   42.2  124.3  158.3          10.8     1080.8    796.2   14     23.51
   2   250.0    0.0   95.7  187.4           5.5      956.9    861.2   28     29.22
   3   266.0  114.0    0.0  228.0           0.0      932.0    670.0   28     45.85
   4   154.8  183.4    0.0  193.3           9.1     1047.4    696.7   28     18.29
   5   255.0    0.0    0.0  192.0           0.0      889.8    945.0   90     21.86
   6   166.8  250.2    0.0  203.5           0.0      975.6    692.6    7     15.75
   7   251.4    0.0  118.3  188.5           6.4     1028.4    757.7   56     36.64
   8   296.0    0.0    0.0  192.0           0.0     1085.0    765.0   28     21.65
   9   155.0  184.0  143.0  194.0           9.0      880.0    699.0   28     28.99

Last rows

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength
1020   183.9  122.6    0.0  203.5           0.0      959.2    800.0    7     10.79
1021   203.5  305.3    0.0  203.5           0.0      963.4    630.0    3      9.56
1022   144.8    0.0  133.6  180.8          11.1      979.5    811.5   28     13.20
1023   141.3  212.0    0.0  203.5           0.0      971.8    748.5    7     10.39
1024   297.2    0.0  117.5  174.8           9.5     1022.8    753.5    3     21.91
1025   135.0    0.0  166.0  180.0          10.0      961.0    805.0   28     13.29
1026   531.3    0.0    0.0  141.8          28.2      852.1    893.7    3     41.30
1027   276.4  116.0   90.3  179.6           8.9      870.1    768.3   28     44.28
1028   342.0   38.0    0.0  228.0           0.0      932.0    670.0  270     55.06
1029   540.0    0.0    0.0  173.0           0.0     1125.0    613.0    7     52.61

Duplicate rows

Most frequent

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength  count
   1   362.6  189.0    0.0  164.9          11.6      944.7    755.8    3     35.30      4
   3   362.6  189.0    0.0  164.9          11.6      944.7    755.8   28     71.30      4
   4   362.6  189.0    0.0  164.9          11.6      944.7    755.8   56     77.30      4
   5   362.6  189.0    0.0  164.9          11.6      944.7    755.8   91     79.30      4
   2   362.6  189.0    0.0  164.9          11.6      944.7    755.8    7     55.90      3
   6   425.0  106.3    0.0  153.5          16.5      852.1    887.1    3     33.40      3
   7   425.0  106.3    0.0  153.5          16.5      852.1    887.1    7     49.20      3
   8   425.0  106.3    0.0  153.5          16.5      852.1    887.1   28     60.29      3
   9   425.0  106.3    0.0  153.5          16.5      852.1    887.1   56     64.30      3
  10   425.0  106.3    0.0  153.5          16.5      852.1    887.1   91     65.20      3
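In the "Most frequent" duplicate table, the count column gives how often each listed row occurs. Suppose the "Duplicate rows" statistic counts every occurrence after the first, which is the behaviour of pandas' `DataFrame.duplicated()` with its default `keep="first"`. That pandas-profiling counts this way is an assumption. Under it, each group contributes count - 1 duplicates, so the ten groups listed account for 24 of the 25 reported duplicate rows. Below is a pure-Python sketch of that rule, with `duplicate_summary` as a hypothetical helper:

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (number of repeat occurrences, [(row, count) for rows seen more than once])."""
    seen = set()
    n_duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            n_duplicates += 1  # counts every occurrence after the first
        else:
            seen.add(key)
    groups = Counter(tuple(row) for row in rows)
    repeated = [(row, count) for row, count in groups.most_common() if count > 1]
    return n_duplicates, repeated


# Toy rows: (1, 2) appears three times and (3, 4) twice.
rows = [(1, 2), (1, 2), (3, 4), (1, 2), (5, 6), (3, 4)]
n_dup, repeated = duplicate_summary(rows)
print(n_dup, repeated)

# Count column of the "Most frequent" table: each group adds count - 1 repeats.
listed_counts = [4, 4, 4, 4, 3, 3, 3, 3, 3, 3]
print(sum(c - 1 for c in listed_counts))
```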